Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters
Abstract
SkePU is a C++ template library with a simple and unified interface for expressing data-parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. SkePU provides programmability, portability and even performance portability, but until now applications written using SkePU could run only on a single multi-GPU node. We present an extension of SkePU to GPU clusters that requires no modification of the SkePU application source code. With our prototype implementation, we performed two experiments. The first experiment demonstrates scalability over multiple GPU nodes for two regular algorithms, N-body simulation and electric field calculation. The results of the second experiment show the benefit of lazy memory copying, in terms of speedup gained, for one level of Strassen’s algorithm and for a synthetic matrix sum application.
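The lazy memory copying idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the SkePU container API: `LazyVector`, `scale_on_device` and `demo_transfer_count` are hypothetical names, and the "device" buffer is an ordinary host vector standing in for GPU memory. The sketch assumes one validity flag per copy and transfers data only when the requested copy is stale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; this is not the SkePU container API.
// The "device" buffer is an ordinary host vector standing in for GPU memory.
class LazyVector {
public:
    explicit LazyVector(std::size_t n, float init = 0.0f)
        : host_(n, init), device_(n, 0.0f) {}

    // Host read access: copies back only if the host copy is stale.
    const std::vector<float>& host_read() {
        sync_to_host();
        return host_;
    }

    // Host write access: afterwards the device copy is marked stale.
    std::vector<float>& host_write() {
        sync_to_host();
        device_valid_ = false;
        return host_;
    }

    // Device read access: uploads only if the device copy is stale.
    const std::vector<float>& device_read() {
        sync_to_device();
        return device_;
    }

    // Device write access: afterwards the host copy is marked stale.
    std::vector<float>& device_write() {
        sync_to_device();
        host_valid_ = false;
        return device_;
    }

    // Number of simulated host/device transfers performed so far.
    std::size_t transfers() const { return transfers_; }

private:
    void sync_to_host() {
        if (!host_valid_) {
            assert(device_valid_);  // at least one copy is always valid
            host_ = device_;
            host_valid_ = true;
            ++transfers_;
        }
    }

    void sync_to_device() {
        if (!device_valid_) {
            assert(host_valid_);  // at least one copy is always valid
            device_ = host_;
            device_valid_ = true;
            ++transfers_;
        }
    }

    std::vector<float> host_, device_;
    bool host_valid_ = true;     // data starts on the host
    bool device_valid_ = false;  // nothing uploaded yet
    std::size_t transfers_ = 0;
};

// Simulated skeleton call: doubles every element of the device copy.
void scale_on_device(LazyVector& v) {
    for (float& x : v.device_write()) x *= 2.0f;
}

// Runs three back-to-back device operations followed by one host read
// and returns how many transfers were needed.
std::size_t demo_transfer_count() {
    LazyVector v(4, 1.0f);
    scale_on_device(v);              // first use: one host-to-device copy
    scale_on_device(v);              // device copy still valid: no copy
    scale_on_device(v);              // device copy still valid: no copy
    float first = v.host_read()[0];  // one device-to-host copy
    assert(first == 8.0f);
    return v.transfers();
}
```

In this sketch, three consecutive device operations and one host read need two transfers, whereas copying eagerly before and after every operation would need six.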
Similar resources
Towards a Tunable Multi-Backend Skeleton Programming Framework for Multi-GPU Systems
SkePU is a C++ template library that provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU systems. Currently available skeletons in SkePU in...
Flexible Runtime Support for Efficient Skeleton Programming on Heterogeneous GPU-based Systems
SkePU is a skeleton programming framework for multicore CPU and multi-GPU systems. StarPU is a runtime system that provides dynamic scheduling and memory management support for heterogeneous, accelerator-based systems. We have implemented support for StarPU as a possible backend for SkePU while keeping the generic SkePU interface intact. The mapping of a SkePU skeleton call to one or more StarP...
Adaptive Implementation Selection in the SkePU Skeleton Programming Library
In earlier work, we have developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++, OpenMP, CUDA and OpenCL targeting either CPU or GPU execution respectively. Deciding which implementation would run faster for a g...
Toward optimised skeletons for heterogeneous parallel architecture with performance cost model
High performance architectures are increasingly heterogeneous with shared and distributed memory components, and accelerators like GPUs. Programming such architectures is complicated and performance portability is a major issue as the architectures evolve. This thesis explores the potential for algorithmic skeletons integrating a dynamically parametrised static cost model, to deliver portable p...
Parallelizing the LM OSEM Image Reconstruction on Multi-Core Clusters
In this paper we present four different parallel implementations of the popular LM OSEM medical image reconstruction algorithm. While two of them use libraries such as MPI, OpenMP, or Threading Building Blocks (TBB) directly, the other two implementations use algorithmic skeletons of the Münster Skeleton Library Muesli to hide the parallelism. We compare the implementations w.r.t. runtime, effi...
Publication date: 2013